
    Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation

    Context: Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed, including Case-based Reasoning (CBR). To improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition, we evaluate these techniques from the perspectives of (i) approach, (ii) strengths and weaknesses, (iii) performance, and (iv) experimental evaluation approach, including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies, which reported a range of different techniques. Of these, 17 make benchmark comparisons with standard CBR, and 16 of those 17 report improved accuracy. Under a one-sample sign test, this positive impact is significant (p = 0.0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs, and we recommend that researchers and practitioners give serious consideration to their adoption.
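
    The two quantitative ideas in this abstract can be made concrete with a minimal Python sketch that is not taken from any of the reviewed studies: analogy (CBR) effort estimation with a feature-weighted Euclidean distance, and an exact sign test. The helper names `weighted_distance`, `cbr_estimate` and `sign_test_p`, the toy project data, and the choice of weighted Euclidean distance with mean-of-k adaptation are all illustrative assumptions; the reviewed FWTs differ mainly in how they derive the weights. Plugging in the review's counts (16 improvements out of 17 comparisons) gives a two-sided p of about 0.00027, which rounds to the reported 0.0003, assuming the review used a two-sided test.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two projects' feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def cbr_estimate(target, cases, weights, k=1):
    """Estimate effort as the mean effort of the k most similar past projects.

    cases is a list of (features, effort) pairs; weights has one entry per feature.
    """
    ranked = sorted(cases, key=lambda case: weighted_distance(target, case[0], weights))
    return sum(effort for _, effort in ranked[:k]) / k

def sign_test_p(successes, n):
    """Exact two-sided sign test p-value under H0: improvement and no improvement equally likely."""
    tail = max(successes, n - successes)
    one_sided = sum(math.comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

if __name__ == "__main__":
    # Hypothetical past projects: ([size, team], effort).
    cases = [([10, 3], 120.0), ([13, 4], 150.0), ([30, 8], 400.0)]
    print(cbr_estimate([11, 8], cases, [1.0, 1.0]))  # 150.0 (equal weights)
    print(cbr_estimate([11, 8], cases, [1.0, 0.0]))  # 120.0 (team size ignored)
    print(round(sign_test_p(16, 17), 4))             # 0.0003
```

    Setting the second weight to zero changes which past project counts as the closest analogue, which is the lever every feature weighting technique pulls; how the weights are learned is where the reviewed methods diverge.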

    Empirical Evaluation of the Difficulty of Finding a Good Value of k for the Nearest Neighbor

    As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we have studied whether it is possible to find a good value of the parameter k for each example according to its attribute values, or at least whether there is a pattern for the parameter k in the original search space. We have implemented several approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimental results of our study, we can state that, in general, it is not possible to know a priori a specific value of k that correctly classifies an unseen example.
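
    The question this study asks, whether a single good k exists for each example, can be illustrated with a minimal k-NN sketch in Python. The tiny one-dimensional training set and the helper names `knn_predict` and `correct_ks` are hypothetical; the study itself used UCI databases and several NN-based approaches that are not reproduced here. For the query below, k = 1 and k = 5 give the correct label while k = 3 does not, so the set of good k values for one example need not even be a contiguous range.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Label query by majority vote among its k nearest training examples (Euclidean)."""
    neighbours = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def correct_ks(train, query, true_label, ks):
    """Return the values of k in ks for which k-NN labels the query correctly."""
    return [k for k in ks if knn_predict(train, query, k) == true_label]

if __name__ == "__main__":
    # Hypothetical one-dimensional training set of (features, label) pairs.
    train = [((0.0,), "A"), ((1.0,), "B"), ((1.2,), "B"), ((3.0,), "A"), ((3.1,), "A")]
    for k in (1, 3, 5):
        print(k, knn_predict(train, (0.4,), k))       # A, then B, then A
    print(correct_ks(train, (0.4,), "A", [1, 3, 5]))  # [1, 5]
```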

    Preceding rule induction with instance reduction methods

    A new prepruning technique for rule induction is presented, which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (composed of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2 without adversely affecting predictive performance. The hybrid achieves the highest average predictive accuracy.
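
    Edited Nearest Neighbour, the first of the reduction algorithms compared, follows Wilson's editing rule: an instance is removed when the majority of its k nearest neighbours (itself excluded) carry a different label. Below is a minimal Python sketch assuming Euclidean distance and k = 3, since the abstract does not state the parameters used; the function name and toy data are illustrative. AllKnn, as it is usually described, repeats this check for every neighbourhood size from 1 up to a chosen maximum and removes an instance if any of those checks flags it.

```python
import math
from collections import Counter

def edited_nearest_neighbour(data, k=3):
    """Wilson's ENN: keep only instances whose label agrees with the majority
    label of their k nearest neighbours, computed on the unedited data set."""
    kept = []
    for i, (x, y) in enumerate(data):
        others = [example for j, example in enumerate(data) if j != i]
        neighbours = sorted(others, key=lambda example: math.dist(example[0], x))[:k]
        majority = Counter(label for _, label in neighbours).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept

if __name__ == "__main__":
    # Hypothetical data: one "b" instance sits inside an "a" region.
    data = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "b"), ((0.3,), "a"),
            ((5.0,), "b"), ((5.1,), "b"), ((5.2,), "b")]
    print(edited_nearest_neighbour(data))  # everything except ((0.2,), 'b')
```

    The reduced set would then be handed to the rule inducer; removing noisy border instances like the one above is what lets CN2 produce fewer rules.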

    Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX

    We apply machine learning in the form of a nearest neighbor instance-based algorithm (NN) to generate full photometric redshift probability density functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky Survey (SDSS DR5). We use a conceptually simple but novel application of NN to generate the PDFs: perturbing the object colors by their measurement error and using the resulting instances of nearest neighbor distributions to generate numerous individual redshifts. When the redshifts are compared to existing SDSS spectroscopic data, we find that the mean value of each PDF has a dispersion between the photometric and spectroscopic redshift consistent with other machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The PDFs allow the selection of subsets with improved statistics. For quasars, the improvement is dramatic: for those with a single peak in their probability distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010, and the photometric redshift is within 0.3 of the spectroscopic redshift for 99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can virtually eliminate 'catastrophic' photometric redshift estimates. In addition to the SDSS sample, we incorporate ultraviolet photometry from the Third Data Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3) to create PDFs for objects seen in both surveys. For quasars, the increased coverage of the observed-frame UV of the SED results in significant improvement over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that this improvement is genuine. [Abridged] Comment: Accepted to ApJ, 10 pages, 12 figures, uses emulateapj.cl
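
    The perturbation idea can be sketched in a few lines of Python: draw noisy copies of an object's colours, scaled by their measurement errors, and record the redshift of the nearest training object for each copy. This is a simplified illustration, not the authors' pipeline; it assumes independent Gaussian errors, a single nearest neighbour per draw, Euclidean distance in colour space, and a hypothetical training list of (colours, spectroscopic redshift) pairs. The function names `nn_redshift_pdf` and `pdf_mean` are illustrative.

```python
import math
import random
from collections import Counter

def nn_redshift_pdf(colours, errors, train, n_draws=1000, seed=0):
    """Discrete redshift PDF for one object from perturbed nearest-neighbour lookups.

    Each draw perturbs every colour by Gaussian noise whose standard deviation is
    that colour's measurement error, then records the redshift of the nearest
    training object. train is a list of (colours, redshift) pairs.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        perturbed = [rng.gauss(c, e) for c, e in zip(colours, errors)]
        _, z = min(train, key=lambda obj: math.dist(obj[0], perturbed))
        counts[z] += 1
    return {z: n / n_draws for z, n in sorted(counts.items())}

def pdf_mean(pdf):
    """Mean redshift of a discrete PDF given as {redshift: probability}."""
    return sum(z * p for z, p in pdf.items())

if __name__ == "__main__":
    # Hypothetical training objects: two colours each, plus a redshift.
    train = [((0.0, 0.0), 0.1), ((1.0, 1.0), 0.5), ((5.0, 5.0), 2.0)]
    pdf = nn_redshift_pdf((0.5, 0.5), (0.3, 0.3), train)
    print(pdf, pdf_mean(pdf))
```

    `pdf_mean` corresponds to the per-object PDF mean that the abstract compares against spectroscopic redshifts; an object whose colours sit between two training clusters gets a PDF split between their redshifts rather than a single point estimate.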

    Depletion of homeostatic antibodies against malondialdehyde-modified low-density lipoprotein correlates with adverse events in major vascular surgery

    We aimed to investigate whether major vascular surgery induces LDL oxidation, and whether circulating antibodies against malondialdehyde-modified LDL (MDA-LDL) change dynamically in this setting. We also examined relationships between these biomarkers and post-operative cardiovascular events. Major surgery can induce an oxidative stress response; however, the role of the humoral immune system in the clearance of oxidized LDL following such an insult is unknown. Plasma samples were obtained from a prospective cohort of 131 patients undergoing major non-cardiac vascular surgery, preoperatively and at 24 and 72 h postoperatively. Enzyme-linked immunoassays were developed to assess MDA-LDL-related antibodies and complexes. Adverse events were myocardial infarction (primary outcome) and a composite of unstable angina, stroke and all-cause mortality (secondary outcome). MDA-LDL increased significantly at 24 h post-operatively (p < 0.0001). Conversely, levels of IgG and IgM anti-MDA-LDL, as well as IgG/IgM-MDA-LDL complexes and total IgG/IgM, were significantly lower at 24 h (each p < 0.0001). In a post hoc analysis, a smaller decrease in IgG anti-MDA-LDL was related to combined clinical adverse events, and this relationship withstood adjustment for age, sex, and total IgG (OR 0.13, 95% CI [0.03–0.5], p < 0.001; p value for trend <0.001). Major vascular surgery resulted in an increase in plasma MDA-LDL, in parallel with a decrease in antibody/complex levels, likely due to antibody binding and subsequent removal from the circulation. Our study provides novel insight into the role of the immune system during the oxidative stress of major surgery, and suggests a homeostatic clearance role for IgG antibodies, with greater reduction relating to downstream adverse events.

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can very much be the powerful tool, and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text

    Do you really follow them? Automatic detection of credulous Twitter users

    Online social media represent a pervasive source of information able to reach a huge audience. Sadly, recent studies show that online social bots (automated, often malicious accounts that populate social networks and mimic genuine users) are able to amplify the dissemination of (fake) information by orders of magnitude. Using Twitter as a benchmark, in this work we focus on what we define as credulous users, i.e., human-operated accounts with a high percentage of bots among their followings. Being more exposed to the harmful activities of social bots, credulous users may run the risk of being more influenced than other users; even worse, albeit unknowingly, they could become spreaders of misleading information (e.g., by retweeting bots). We design and develop a supervised classifier to automatically recognize credulous users. The best tested configuration achieves an accuracy of 93.27% and an AUC-ROC of 0.93, leading to positive and encouraging results. Comment: 8 pages, 2 tables. Accepted for publication at IDEAL 2019 (20th International Conference on Intelligent Data Engineering and Automated Learning, Manchester, UK, 14-16 November, 2019). The present version is the accepted version, and it is not the final published version
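
    The two figures quoted for the classifier measure different things: accuracy scores hard labels at one decision threshold, while AUC-ROC is threshold-free and equals the probability that a randomly chosen positive (here, a credulous user) receives a higher score than a randomly chosen negative. A minimal Python sketch of both metrics follows; the labels, scores and the 0.5 threshold are made up for illustration and are unrelated to the study's data or classifier.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc_roc(scores, labels):
    """AUC-ROC by pairwise ranking: P(positive scored above negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [1, 1, 0, 0]          # 1 = credulous, 0 = not credulous (hypothetical)
    scores = [0.9, 0.2, 0.6, 0.1]  # hypothetical classifier scores
    preds = [int(s >= 0.5) for s in scores]
    print(accuracy(preds, labels))  # 0.5
    print(auc_roc(scores, labels))  # 0.75
```

    The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation gives the same value more efficiently on large data sets.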

    Lectin-like bacteriocins from Pseudomonas spp. utilise D-rhamnose containing lipopolysaccharide as a cellular receptor

    Lectin-like bacteriocins consist of tandem monocot mannose-binding domains and display a genus-specific killing activity. Here we show that pyocin L1, a novel member of this family from Pseudomonas aeruginosa, targets susceptible strains of this species through recognition of the common polysaccharide antigen (CPA) of P. aeruginosa lipopolysaccharide, which is predominantly a homopolymer of D-rhamnose. Structural and biophysical analyses show that recognition of CPA occurs through the C-terminal carbohydrate-binding domain of pyocin L1 and that this interaction is a prerequisite for bactericidal activity. Furthermore, we show that the previously described lectin-like bacteriocin putidacin L1 has a similar carbohydrate-binding specificity, indicating that oligosaccharides containing D-rhamnose, and not D-mannose as was previously thought, are the physiologically relevant ligands for this group of bacteriocins. The widespread inclusion of D-rhamnose in the lipopolysaccharide of members of the genus Pseudomonas explains the unusual genus-specific activity of the lectin-like bacteriocins.